Hal Chamberlin is vice president of Research and Development for Micro Technology Unlimited, Box 12106, Raleigh, NC 27605. Active in electronic sound synthesis since 1966 and in computer music synthesis since 1970, he has authored numerous magazine articles and has recently published a book entitled Musical Applications of Microprocessors.
The use of microcomputers for teaching, composing, transcribing and playing music is rapidly becoming a major application area.
New synthesizer boards, music programs, and integrated systems with music capability are among the new products highlighted in
the microcomputer field. It is now common for university and even high school music departments to acquire a quantity of
microcomputers solely for musical purposes. It is even getting to the point where it is hard to find a microcomputer owner
without some kind of music program, even if it only plays kazoo-like music through a built-in two-inch speaker.
The Computer Music Field
Since any complete discussion of microcomputer music is impossible within the confines of a magazine format, this article deals
with a much narrower subject area. First, however, we need to characterize the field somewhat to see how the topics of musical
instrument simulation and software digital synthesis fit in.
Computer music systems cover a broad range of sophistication, application, capability, sound quality and cost. At one extreme,
we have the limited-range, tinny, one-voice, "gee whiz" type of system mentioned earlier that can be set up for a
dollar's worth of parts and a program whose listing would not even fill a page.
At the other extreme, we have experimental computer music systems in some universities that have a range beyond human perception,
quadraphonic sound quality exceeding that of the best recording equipment, virtually unlimited synthesis capabilities and
practically infinite voice count at a cost (if measured by industrial standards) in the millions. Using a microcomputer,
you can set up a system with reasonably wide range, good stereo sound quality, good synthesis flexibility and 32 voices for
a few thousand dollars. The important point is that there is a definite need for systems addressing these extremes and many
points in between.
The simple one-voice systems are certainly the most common and are fully adequate for teaching elementary music concepts as
well as for impressing friends and neighbors. In fact, they are probably preferred for getting started because their very
simplicity makes them easy to learn and use. Since only pitch and timing can be controlled, there are only two variables to
worry about. Harmony, timbre, envelope and dynamics are either absent or predetermined.
Note that this type of music system is easily implemented either purely in software using timed loops, which toggle an output
port bit, or through a combination of software and hardware where control bytes are sent to a simple divide-by-N counter which
may even be part of the I/O interface chip used by the computer. With this level of system, you either quickly outgrow it and
move on or are content to file the program alongside the Lunar Lander and Star Trek cassettes.
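As a rough illustration of the divide-by-N approach, the sketch below computes the divider value for a desired pitch. Python is used only for readability, and the 1 MHz clock and the function names are assumptions of this example, not part of any particular interface chip.

```python
def divider_for_pitch(freq_hz, clock_hz=1_000_000):
    """Return N such that toggling the output every N clock ticks
    gives a square wave close to freq_hz (clock_hz is an assumed value)."""
    # One full square-wave cycle needs two toggles: a high half and a low half.
    return max(1, round(clock_hz / (2 * freq_hz)))


def actual_pitch(n, clock_hz=1_000_000):
    """Frequency actually produced by divider value n."""
    return clock_hz / (2 * n)


n = divider_for_pitch(440.0)
print(n, actual_pitch(n))
```

Because N must be an integer, the pitch actually produced is only close to the one requested, and the error grows as the notes get higher and N gets smaller.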
The next step up is generally either a synthesizer board or an inexpensive eight-bit digital-to-analog converter. The synthesizer
board is a set of several oscillators which at a minimum are programmable for pitch and amplitude. (There are a couple of very
sophisticated single-voice synthesizer boards, but they are intended to be used in multiples.)
The simplest type of synthesizer board has three square-wave oscillators with pitch and amplitude registers for each and sometimes
an overall volume control. Typically, these boards are implemented using programmable timer integrated circuits as the oscillators
(normally intended for use in process-control-oriented microcomputers) and discrete circuitry for the volume control function.
Recently, General Instrument introduced a synthesizer chip that has the three oscillators and the volume control circuitry
integrated on a single chip along with a noise generator useful for limited percussion effects. These chips, usually in trios
for a total of nine voices, are appearing in the latest batch of synthesizer boards. Prices range from a little over $100 to
nearly $300 with little connection between capability and price.
With these synthesizer boards, the computerist musician gains a great deal of flexibility, since he can play complex chords and
control dynamics and tone envelopes. As a result, these boards are a great deal more difficult to master, although you can choose
to ignore some of the variables initially.
There is one serious shortcoming, however: all synthesizer boards in this class produce square waves exclusively. (One three-voice
board on the market has the capability of combining two of the voices into a single variable width rectangular wave, which
increases the tonal variety somewhat.) Square waves have a rather sharp, yet hollow, sound that most closely resembles that of
a kazoo. By suitable control of the amplitude envelope, you can produce continuous organ-like tones and percussive plucked-like
tones, but the basic character of square waves remains.
With the proliferation of this type of board in recent months (and its constant demonstration at computer shows), the public may
very well come to associate square wave sound with computers, just as a piano is associated with its own tone color. This would be
unfortunate indeed, since the ultimate value added by computers in music is a wider range of timbres than any other instrument.
Nevertheless, there is sufficient expressive power available so that the difference between a piece programmed by a novice and
one programmed by an experienced musician is readily apparent.
Music systems based on these synthesizer boards seem to satisfy many users whose goal is to learn music, enjoy transcribing music
into the computer and even perform simple composition. They typically will not satisfy a musician attempting to do serious
performance work with the computer.
Much more sophisticated synthesizer boards with programmable waveforms are also available in the $500 to $2000 price range. These
overcome the lack of tonal variety of the square wave boards by providing programmable waveforms, usually with provisions for a
different waveform for each voice.
An important consideration that will be discussed later is whether the board allows dynamically variable waveforms; that is, the
ability to smoothly alter the waveform while a note is being played with it. This is a requirement for many effects such as
the "wah" of a muted trombone, and, as might be expected, the less expensive units do not provide for it. In either case,
the tonal variety is far greater than the square wave units and is sufficient to satisfy many musicians as well as casual users.
On the other side of the hardware/software fence are music systems based on digital-to-analog converters (DAC). As we shall see
later, a digital-to-analog converter simply translates numbers into voltages. A very rapid string of numbers produces a rapidly
varying voltage; that is, an audio waveform.
In theory, appropriate software can calculate the necessary number sequence to produce literally any sound. The capabilities of a
music system based on a digital-to-analog converter are determined solely by the sophistication of the software involved rather
than the capabilities designed and frozen into a hardware synthesizer. DAC boards also tend to be less expensive. A good eight-bit
DAC board sells for less than $70, while an experimental home brew unit can be put together for half the price of a movie ticket.
The remainder of Part 1 will describe how sound generation software works in a DAC-based system.
Numbers to Sound
The fundamental principle behind digital sound synthesis is that a string of numbers from a computer program may be converted into
a high-fidelity audio signal. As you might expect, the rate at which the numbers are supplied and the precision of the numbers
both determine the fidelity of the resulting sound. In synthesis applications, fidelity's usual definition, i.e., faithfulness
to the original, does not apply, since there is no original. Instead, fidelity is used to refer to the frequency range that
can be produced and the relative freedom from undesired noise and distortion.
Fig. 1 shows how a DAC can produce a smooth audio waveform from a string of numbers and the errors involved. The grid in the
figures represents time in the horizontal direction and voltage in the vertical direction.
Fig. 1a shows a greatly magnified drawing of a small portion of a typical audio waveform. Notice that it wiggles and curves through
the figure without regard for the grid.
In Fig. 1b we have the raw output of a DAC being fed the string of numbers representing the waveform in Fig. 1a.
Each vertical grid line represents the point in time that the DAC receives a new number; thus, it stands to reason that the DAC
output can only change up or down at vertical grid lines. Each horizontal grid line represents a possible numerical value that
the DAC can receive. For example, an eight-bit DAC can only accept 256 (2^8) different numbers, so the complete grid for such
a DAC would have 256 horizontal grid lines. As a result, the DAC output can only dwell at a horizontal grid line. Needless to
say, the smoothly curved waveform of
system speed. The important points are that frequency range and background noise level are independently adjustable system
parameters and that greater fidelity is accompanied by a higher data rate. Note that it is considerably less expensive in terms
of data rate to reduce the noise level than it is to increase the high-frequency limit. The two stars in Fig. 2 represent the
two software digital music synthesis systems that will be discussed in this article.
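The trade-off between noise level and frequency range can be put into numbers. The sketch below assumes the usual rule of thumb of about 6 dB of signal-to-noise ratio per bit of DAC resolution; the function names and the 8 kHz, eight-bit starting point are choices made for this example.

```python
def data_rate_bits_per_second(sample_rate_hz, bits_per_sample):
    """Raw data rate the computer must sustain to feed the DAC."""
    return sample_rate_hz * bits_per_sample


def approx_snr_db(bits_per_sample):
    """Rule-of-thumb signal-to-noise ratio: roughly 6 dB per bit."""
    return 6.02 * bits_per_sample


base = data_rate_bits_per_second(8000, 8)
one_more_bit = data_rate_bits_per_second(8000, 9)
double_rate = data_rate_bits_per_second(16000, 8)

# Lowering the noise by about 6 dB costs one extra bit per sample.
print(one_more_bit / base, approx_snr_db(9) - approx_snr_db(8))
# Doubling the usable frequency range costs twice the data rate.
print(double_rate / base)
```

One more bit of resolution raises the data rate by one-eighth, while an extra octave of frequency range doubles it.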
Where the Numbers Come From
The real trick in a DAC-based music system, then, is to compute the string of numbers, or samples, representing the desired sound
and then send it to the DAC at the required rate. In all of the cases that will be considered here, the sample rate will be
constant because that assumption greatly simplifies the computations. Conversely, when the rate is assumed to be constant, it
must be held constant to rather close tolerances to avoid excessive jitter noise.
At this point you can choose to go in either of two directions. In real-time digital synthesis, the samples are computed at the
rate required by the DAC and sent to it immediately. The advantage, of course, is that the sound is heard in its final form as
the program is running. The disadvantage is that practical sample rates are relatively high, which means that a very efficient
program using an uncomplicated synthesis technique running on a fast microcomputer is required.
The other choice is delayed playback digital synthesis, where the computed samples are first written into a mass storage device
at relatively low speed and then later reread and played through the DAC at the necessary high speed. The advantages here are
that the synthesis program can be more accurate (and thus slower), any synthesis technique of any complexity can be utilized,
and the higher sample rates and DAC resolutions necessary for high fidelity can be utilized. The main disadvantages are a rather
long delay between program execution and audible results and the need for a large capacity, high-speed mass storage system.
It is also possible to combine the two philosophies: real time for composition and experimentation with the orchestration, and
delayed playback for a high-fidelity final result.
waveform table entries does not contribute to distortion if the tabulated waveform conforms to certain rules that will be discussed
later.
Fig. 4 aids in understanding the scanning process. Here the example 16-point waveform table has been bent into a circle, which is
one way to view the wrap-around process mentioned earlier. The arrow represents the waveform table pointer, which contains the
contents of a machine register or memory location. The bracket represents the value of the waveform table increment, which
indicates how far the table pointer is advanced every 125 µs sample period.
Thus, if the increment is one, the pointer will take on values of 0, 1, 2, . . ., 14, 15, 0, . . . (0-255 in real life) and give
us a low note. If the increment is 3, the pointer will go through the sequence 0, 3, 6, 9, 12, 15, 2, . . . and give us a
three-times-higher note. Thus, the increment is proportional to the pitch of the synthesized tone. Note that in this case successive
trips around the table are not exactly the same. Again, this does not lead to distortion if the waveform meets certain requirements.
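The scanning process for the 16-point example can be sketched in a few lines. Python stands in for the machine code here, and the function name is an invention of this example.

```python
def scan_pointer(table_size, increment, steps):
    """Return the successive pointer values when a table of table_size
    entries is scanned with an integer increment, wrapping at the end."""
    pointer = 0
    sequence = []
    for _ in range(steps):
        sequence.append(pointer)
        pointer = (pointer + increment) % table_size  # wrap-around
    return sequence


print(scan_pointer(16, 1, 18))  # low note: every entry in turn
print(scan_pointer(16, 3, 7))   # [0, 3, 6, 9, 12, 15, 2]
```

With an increment of 3 the second trip around the table starts at entry 2 rather than 0, which is why successive trips are not identical.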
Returning to the real case of a 256 point table, it is apparent that the frequency resolution of 31 Hz when using integral waveform
table increments is not sufficient for most musical applications. What is needed is the ability to specify an increment with a
fractional part such as 7.04 to produce a precise A below middle C. This is quite possible but requires that the waveform pointer
also take on a fractional part, which leads to a problem. How should the table be read when the pointer says "read the
78.645th entry"?
A sensible answer would be to look at both the 78th and 79th entries and then interpolate between them. Unfortunately, even simple
linear interpolation is fairly complex (requires a multiply), which means it is slow. For real-time digital synthesis on a
microcomputer, we will be forced to ignore the fractional part of the pointer when reading the table but include it when adding the
increment to compute the next value of the pointer. Taking this shortcut leads to a distortion called interpolation noise, which
is significant but generally tolerable.
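The arithmetic behind the fractional increment, and the shortcut of ignoring the fraction when reading the table, might look like the sketch below. It assumes the 8 kHz sample rate (125 microsecond period) and 256-entry table used in the text; the function names are illustrative only.

```python
SAMPLE_RATE = 8000   # samples per second (125 microsecond period)
TABLE_SIZE = 256     # entries in one waveform table


def increment_for(freq_hz):
    """Table increment (with fractional part) that yields freq_hz."""
    return freq_hz * TABLE_SIZE / SAMPLE_RATE


def truncated_indices(increment, steps):
    """Table indices actually read when the pointer keeps its fraction
    but the fraction is ignored at lookup time (no interpolation)."""
    pointer = 0.0
    indices = []
    for _ in range(steps):
        indices.append(int(pointer))                   # drop the fraction
        pointer = (pointer + increment) % TABLE_SIZE   # keep the fraction
    return indices


print(SAMPLE_RATE / TABLE_SIZE)       # pitch step with integer increments
print(increment_for(220.0))           # A below middle C
print(truncated_indices(7.04, 5))     # [0, 7, 14, 21, 28]
```

The pointer itself advances by exactly 7.04 entries per sample on average, so the pitch is correct; only the individual sample values are slightly off, which is the source of the interpolation noise.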
Now how might a program segment be set up to manipulate the pointer, increment and table to generate sample values for the DAC?
Fig. 5 shows the arrangement of a waveform table, its pointer and its increment in memory. For illustration purposes, the waveform
table is assumed to be in memory from 3200-32FF, which is page 32, while the pointer and increment are kept in memory page zero for
fast access. The increment is a two-byte value with an integer byte and fraction byte as mentioned above. The decimal equivalent
of the increment value shown is 11.633. The pointer is actually a three-byte value.
The most significant byte is the page number (32) of the waveform table and normally remains constant but can be changed to select
a different waveform. The middle byte is the integer byte of the pointer into that table, while the least significant byte is
the fractional part of the pointer.
Every sample period (125 µs) the increment is double-precision added to the integer and fractional parts of the pointer, and the
pointer is replaced with the result. Any overflow is simply ignored, since it is merely an indication of wrap-around from the end
to the beginning of the waveform table. Actual table lookup is extremely simple in the 6502; you simply use the rightmost two
bytes of the pointer (the waveform table page address and the integer part of the pointer) as the indirect address of an indirect
load instruction. Thus, only one instruction is needed to look up in the waveform table. The 6502 machine-language code shown
requires only 23 µs to do all of this.
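For readers who do not have the listing at hand, the pointer update and table lookup can be modeled as follows. This is a Python model of the idea, not the 6502 code itself; the byte values 0x0B and 0xA2 are this example's own encoding of 11.633, and the ramp stored in page 32 is only a placeholder waveform.

```python
TABLE_PAGE = 0x32                 # waveform table occupies 3200-32FF
memory = bytearray(65536)         # model of the 64K address space
for i in range(256):
    memory[(TABLE_PAGE << 8) + i] = i   # placeholder waveform: a ramp


def advance(pointer_int, pointer_frac, inc_int, inc_frac):
    """Double-precision add of the increment to the pointer.
    Overflow out of 16 bits is discarded, which is the wrap-around."""
    total = ((pointer_int << 8) | pointer_frac) + ((inc_int << 8) | inc_frac)
    total &= 0xFFFF
    return total >> 8, total & 0xFF


def lookup(page, pointer_int):
    """Page byte and integer byte together form the table address."""
    return memory[(page << 8) | pointer_int]


p_int, p_frac = advance(0x00, 0x00, 0x0B, 0xA2)   # increment of about 11.633
print(p_int, p_frac, lookup(TABLE_PAGE, p_int))
```

Note that the fractional byte takes part in the addition but never in the lookup, exactly as described above.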
Since the 23 µs figure is considerably less than the 125 µs allowable, you can have several waveforms, pointers and increments
for several simultaneous tones. There is enough time to handle four tones with some left over for housekeeping. You could also
have fewer voices and a higher sample rate, or more and a lower rate.
There are two ways to combine the four table-lookup values into a single eight-bit value for the DAC. One is to simply add them up
and send the sum to the DAC, which is the equivalent of audio mixing. When this is done the waveform table values must have been
adjusted when the table was computed to avoid overflow (which can lead to horrendous distortion) when the four voices are added up.
The other method is to immediately send each value to the DAC when it is found and let the lowpass filter smear them together, thus
effecting mixing. One disadvantage of this approach is that the dwell time of each voice in the DAC must be the same or there
will be differences in loudness among the voices. Another disadvantage is that certain DAC distortions are accentuated, although
they are usually not significant at the eight-bit level. It is also a simple matter to have two DACs and direct two voices to each
for an approximation to stereo.
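The first mixing method, and the reason the tables must be scaled in advance, can be shown with a small model. The sine tables and function names below are assumptions of this sketch; the masking with 0xFF imitates an eight-bit add that silently wraps.

```python
import math

TABLE_SIZE = 256
VOICES = 4
MAX_PER_VOICE = 255 // VOICES   # 63, so four voices can never exceed 252


def sine_table(peak):
    """One cycle of a sine wave offset so that values run from 0 to peak."""
    return [round((math.sin(2 * math.pi * i / TABLE_SIZE) + 1) / 2 * peak)
            for i in range(TABLE_SIZE)]


def mix(tables, pointers):
    """Add one entry from each table, keeping only eight bits of the sum."""
    return sum(t[p] for t, p in zip(tables, pointers)) & 0xFF


safe = sine_table(MAX_PER_VOICE)
loud = sine_table(255)

print(mix([safe] * VOICES, [64] * VOICES))   # all four at their peak: 252
print(mix([loud] * VOICES, [32] * VOICES))   # wrapped sum, badly wrong
```

With full-scale tables the true sum no longer fits in eight bits, and the wrapped result bears no relation to the intended sample value.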
Listing 1 shows the core sound generation routine used in a digital synthesis program first published in 1977. It is capable of
generating four tones simultaneously, where each tone can use a different waveform table. It uses the "add-'em-up"
technique of mixing the four voices into a single sample value for the DAC. A separate routine is expected to store the appropriate
values in each of the four increments for the desired pitches and also set TEMPO and DUR for the desired duration of the chord.
Each time through the main loop takes 115 µs and represents one sample period, thus the sample rate is 8.7 kHz. Also, each time
through the loop decrements a copy of TEMPO, which is held in the X register. When X decrements to zero, it is restored from TEMPO,
and DUR is decremented directly in memory. If DUR also decrements to zero, the chord is complete and a return to the setup routine
is taken. Thus, the total chord duration is proportional to the product of TEMPO and DUR. This property makes it possible to change
the speed of the music without recoding it.
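The nested counting of TEMPO and DUR can be modeled as below. This sketch follows the description in the text rather than the listing itself, restricts both values to 1 through 255, and uses invented function names.

```python
def samples_in_chord(tempo, dur):
    """Count the sample periods produced before DUR reaches zero."""
    if not (1 <= tempo <= 255 and 1 <= dur <= 255):
        raise ValueError("tempo and dur are assumed to be 1..255 here")
    x = tempo            # copy of TEMPO held in the X register
    count = 0
    while True:
        count += 1       # one sample computed and sent to the DAC
        x -= 1
        if x == 0:
            x = tempo    # restore X from TEMPO
            dur -= 1
            if dur == 0:
                return count


def chord_seconds(tempo, dur, loop_us=115):
    """Chord duration given the loop time in microseconds."""
    return samples_in_chord(tempo, dur) * loop_us / 1_000_000


print(samples_in_chord(100, 20), chord_seconds(100, 20))
```

Halving TEMPO halves every chord duration in the piece while the DUR values stored with the music stay untouched.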
Note the presence of time-equalizing instructions at TIMWAS so that the loop time is the same whether or not register X decrements
to zero. This is necessary to eliminate jitter distortion mentioned earlier. The setup routine would look at coded music in memory
to determine what successive values of the four increments, DUR and possibly TEMPO should be to produce the desired music.
Typically, music data would be set up in memory as a set of five bytes for each musical "event" (note or chord) in the
piece. The first byte would be the duration, while the other four would represent the desired pitch of each of the four voices.
A note frequency table would be used to determine the proper two-byte value of the increment from the one-byte pitch code. This
routine must also be as fast as possible because sound generation is stopped when it is in control. If the flow of samples is
stopped for too long, an objectionable click between notes is introduced. See references for a further explanation of the setup
routine. Next month we will continue our discussion of synthesizing multiple tones using waveform table data, explore the capabilities of
existing DAC software and examine some of the prospects for the future.
In Part 1 we devised a method of synthesizing multiple tones with any waveform desired. The question now becomes, "How do
you determine what samples to put into a waveform table?"
Perhaps the simplest method is to draw one cycle of the waveform on graph paper and then laboriously read off 256 sample values and
enter them into the table. The drawn shape could come from an oscilloscope photo of a musical instrument sound or from imagination.
The drawn shape must span exactly 256 grid lines in exactly one cycle to be valid. You could also make use of a light pen or
graphic digitizer in conjunction with a drawing program to do the same thing with much less effort.
The biggest problem when using imagination is that there is no simple relation between the appearance of the drawn shape and
the resulting timbre. Thus, if a particular shape produces a sound that is close to what is desired, there is no way to know
what must be changed to make it sound even closer.
Filling the Waveform Tables
Probably the best way to fill waveform tables is to write a program that accepts harmonic specifications, computes the corresponding
wave-shape and automatically enters it into memory. There is a very definite correlation between the harmonic makeup of a tone and
its timbre. You can also occasionally find published harmonic analyses of musical instrument tones, particularly organ pipes.
Listing 1 shows a very simple BASIC program that can be used to create waveform table data and poke it directly into memory. The
statements starting at line 3000 first amplitude-normalize the waveform, convert the samples into integer form in the range
of 0 to 63 (to avoid overflow when four are added up) and then poke them into memory.
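The same steps can be sketched in Python for readers who want to experiment away from the target machine. This is not a translation of the BASIC listing; the function name and the (harmonic number, amplitude, phase) format are assumptions of this example, and the result is returned as a list rather than poked into memory.

```python
import math


def build_table(harmonics, table_size=256, peak=63):
    """Build one waveform table from harmonic specifications.

    harmonics: list of (harmonic_number, amplitude, phase_radians).
    The summed wave is amplitude-normalized, then converted to
    integers in the range 0..peak."""
    raw = [sum(a * math.sin(2 * math.pi * h * i / table_size + ph)
               for h, a, ph in harmonics)
           for i in range(table_size)]
    biggest = max(abs(v) for v in raw)
    if biggest == 0:
        raise ValueError("at least one harmonic must have nonzero amplitude")
    return [round((v / biggest + 1) / 2 * peak) for v in raw]


# Fundamental plus a half-strength second harmonic.
table = build_table([(1, 1.0, 0.0), (2, 0.5, 0.0)])
print(min(table), max(table), len(table))
```

Normalizing before scaling means every table uses the full 0 to 63 range no matter how many harmonics are specified or how strong they are.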
The biggest advantage of using harmonics to specify waveforms is that alias distortion can be readily avoided. Alias distortion
occurs whenever any frequency component of a waveform exceeds one-half of the sampling frequency. This can easily happen with high
notes using waveforms rich in harmonics.
For example, if you attempt to play high C (523 Hz) using a waveform with ten significant harmonics through an 8 kHz sample rate
system, the eighth, ninth and tenth harmonics will alias, since they will be 4184, 4707 and 5230 Hz, respectively, all above four kHz.
Aliasing means that intended frequencies are altered and usually produces an objectionably harsh sound. Thus, waveform tables used
to play high notes should have their upper harmonics restricted, while those for low notes may have dozens of significant harmonics
if desired.
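A quick check for aliasing before a table is assigned to a note might look like the following sketch. The function names are illustrative; the rule applied is simply the one stated above, that no harmonic may exceed half the sample rate.

```python
def aliased_harmonics(fundamental_hz, n_harmonics, sample_rate_hz):
    """Harmonic numbers whose frequency exceeds half the sample rate."""
    limit = sample_rate_hz / 2
    return [h for h in range(1, n_harmonics + 1)
            if h * fundamental_hz > limit]


def max_safe_harmonic(fundamental_hz, sample_rate_hz):
    """Highest harmonic number that stays at or below half the sample rate."""
    return int((sample_rate_hz / 2) // fundamental_hz)


print(aliased_harmonics(523, 10, 8000))   # [8, 9, 10]
print(max_safe_harmonic(523, 8000))       # 7
print(max_safe_harmonic(65.4, 8000))      # a low C can carry many more
```

A setup routine could use such a limit to pick a waveform table with fewer harmonics for the higher notes.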
Musical Instrument Synthesis
After some experimentation with different waveforms and types of music, you will discover that a wide variety of tone colors is
possible, but the tones always sound like an organ. Of course, the organ is the most versatile of conventional musical instruments,
but digital synthesis should be able to do better. One of the reasons for an organ-like sound is that only continuous, sustained
tones can be generated by simple waveform table scanning. In other words, the amplitude envelope is rectangular, as shown
in Fig. 1a. Many instruments have other shapes, such as those in Figs. 1b, 1c and 1d.
The standard method of adding an amplitude envelope to a sound is to pass it through a variable-gain amplifier and vary the gain
in accordance with the desired envelope shape. In digital synthesis this is equivalent to multiplying the samples representing
the sound by an amplitude factor that changes as the note progresses. The series of amplitude factors could come from an envelope
table that is scanned just like the waveform table but much more slowly.
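In its most direct form, envelope scaling is one multiplication per sample. The sketch below assumes signed samples centered on zero and an envelope table of factors between 0 and 1; the names and the step-wise (uninterpolated) envelope scan are choices made for this example.

```python
def apply_envelope(samples, envelope, samples_per_step):
    """Multiply each sample by the current envelope factor.

    The envelope table advances one entry every samples_per_step
    samples, far more slowly than the waveform table is scanned.
    The last entry is held if the sound outlasts the table."""
    shaped = []
    for n, sample in enumerate(samples):
        step = min(n // samples_per_step, len(envelope) - 1)
        shaped.append(sample * envelope[step])
    return shaped


decay = [1.0, 0.5, 0.25]
print(apply_envelope([10, -10, 10, -10, 10, -10], decay, 2))
```

It is this multiplication on every sample that makes the direct method expensive on a microprocessor without a hardware multiply.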
Adding overall envelope control certainly improves the variety of sounds available and is frequently enough to give reasonable
simulations of common musical instruments. However, rather than spending a lot of time explaining how overall envelope control
can be added to a table-scanning digital synthesis system (which mainly involves methods for eliminating time-consuming
multiplication), let's go all the way and include timbre envelopes as well.
To some extent the sound of all instruments changes its waveform during the course of a note. Consider, for example,
the "waaahhh" of a muted trombone or the "twaanng" of a guitar. The change in character of the sound during
the notes is what makes these instrument sounds so distinctive. In terms of synthesizing these and similar sounds, it is
the harmonic composition, as well as the overall amplitude, of the waveform that changes gradually.
The standard method of adding a timbre envelope to a sound is to pass it through a variable filter and vary the cutoff or
center frequency and Q factor in accordance with the desired effect. In digital synthesis you have to use a digital filter,
which involves several multiplications per sound sample. This is just not practical in a real-time microcomputer-based system,
so some other method must be found. But first we need a way to visualize timbre envelopes so that they can be specified.
Fig. 2a shows a simplified decaying waveform of a plucked string. The overall amplitude envelope is quite
similar to that of Fig. 1b, but the waveform itself also changes shape.
At the very beginning, the second harmonic is actually stronger than the fundamental. The second harmonic is responsible for the
crook in the waveform near the baseline. However, as the waveform decays, the second harmonic decays faster than the fundamental
and thus the crook gradually disappears. Eventually, the second harmonic fades out completely, leaving just a decaying sine wave.
This is reasonable behavior for a plucked string because high-frequency vibrations encounter greater losses in strings than
low-frequency ones do.
Fig. 2b shows one way of representing this behavior in meaningful terms. The solid line shows the amplitude envelope of the
fundamental, while the dotted line shows the envelope of the second harmonic. We can find out the harmonic composition of
the tone at any point in time by erecting a vertical scale at that point and reading off the amplitude of each harmonic as shown.
The same idea will work for any number of harmonics.
Now, how can we modify the tone generator routine described last month for varying waveforms? The secret is to arrange for the
waveform table address bytes, which are normally constant, to change while the table scanning is taking place. Thus, while
the tone is sounding, the synthesis program is actually switching through a sequence of waveform tables. If the switching is
fairly rapid and the contrast between adjacent waveform tables is small, the audible effect is that of a smooth transition.
The idea is not unlike that of a sequence of image frames giving the illusion of smooth motion in a movie.
Fig. 3a illustrates this concept by showing the resulting stair-step approximation to the smooth harmonic envelopes in Fig. 2b.
In this example only eight waveform tables are used; in a practical situation it is common to use between 15 and 30 of them.
Fig. 3b shows the resulting waveform, which even for this coarse example bears a remarkable resemblance to the ideal case in Fig. 2a.
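One way to generate such a set of tables is to sample each harmonic envelope at the middle of every stair step and build a waveform from the amplitudes found there. The sketch below assumes envelopes given as (time, amplitude) breakpoints with strictly increasing times; the breakpoint values for the plucked string are invented for illustration and are not taken from the figures.

```python
import math


def envelope_at(points, t):
    """Linearly interpolate (time, amplitude) breakpoints at time t."""
    if t <= points[0][0]:
        return points[0][1]
    for (t0, a0), (t1, a1) in zip(points, points[1:]):
        if t <= t1:
            return a0 + (a1 - a0) * (t - t0) / (t1 - t0)
    return points[-1][1]


def build_table_sequence(harmonic_envelopes, duration, n_tables,
                         table_size=256):
    """Return n_tables waveform tables, one per stair step."""
    tables = []
    for k in range(n_tables):
        t = (k + 0.5) * duration / n_tables      # midpoint of this step
        amps = {h: envelope_at(env, t)
                for h, env in harmonic_envelopes.items()}
        tables.append([sum(a * math.sin(2 * math.pi * h * i / table_size)
                           for h, a in amps.items())
                       for i in range(table_size)])
    return tables


# Hypothetical plucked string: the second harmonic starts stronger
# than the fundamental but dies away twice as fast.
plucked = {1: [(0.0, 0.6), (1.0, 0.0)],
           2: [(0.0, 1.0), (0.5, 0.0)]}
tables = build_table_sequence(plucked, duration=1.0, n_tables=8)
print(len(tables), len(tables[0]))
```

The tables here are left as signed fractions; a real system would still scale them to the integer range expected by the sound generation routine.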
In the actual implementation of waveform table switching, the concept of a waveform sequence table is introduced. The waveform
sequence table is nothing more than a table of waveform table addresses. This extra level of indirection is very little problem
in a microprocessor such as the 6502, and it has many benefits.
While a note is sounding, a pointer scans through the sequence table at uniform speed just as the waveform pointer scans through
the waveform table, but more slowly. In the program implementation, the time equalization instructions are replaced with
instructions to move four pointers through their respective waveform sequence tables at a rate of one increment each time
register X (TEMPO) times out.
One advantage of using a sequence table is that waveform switching can be rapid when there is rapid change in the harmonic
envelopes and less rapid at other times, thus cutting down on the number of waveforms needed and memory usage. Another advantage
is that waveforms do not have to be stored in memory in the order that they are used. This allows such tricks as playing through
the attack sequence backwards for the decay sequence to save on memory.
Another trick is to cycle through a few waveforms during the sustain of a note to impart a sort of warble effect on notes.
A strumming effect can also be created in this manner. You can even construct several sequence tables for the same set of
waveforms to take care of differences in duration and articulation from note to note.
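The extra level of indirection is easy to see in a small model. In this sketch the sequence table holds indices into a list of waveform tables rather than memory addresses, and the tiny four-entry tables are placeholders chosen so the output is easy to follow; all names are assumptions of the example.

```python
def play_note(waveform_tables, sequence, increment,
              samples_per_step, n_samples):
    """Scan waveform tables while stepping through a sequence table.

    sequence lists which waveform table to use at each step; it
    advances once every samples_per_step samples and holds its
    last entry if the note runs longer."""
    table_size = len(waveform_tables[0])
    pointer = 0.0
    output = []
    for n in range(n_samples):
        step = min(n // samples_per_step, len(sequence) - 1)
        table = waveform_tables[sequence[step]]      # the indirection
        output.append(table[int(pointer)])
        pointer = (pointer + increment) % table_size
    return output


# Three placeholder "waveforms", each a constant for clarity.
waves = [[0] * 4, [1] * 4, [2] * 4]
attack = [0, 1, 2]
decay = attack[-2::-1]            # attack played backwards: [1, 0]
print(play_note(waves, attack + decay, 1, 2, 10))
```

Only three waveforms are stored, yet the note passes through five steps, which is the memory saving described above.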
The results of adding waveform table sequencing to the earlier synthesis routine, which was done primarily by Frank Covitz,
are astounding. Attempts at simulating plucked string sounds result in a real plucked sound, and you can easily tell the
difference between a plucked string and a struck string (not possible without timbre envelopes). Blown instruments sound blown,
and bowed instruments sound bowed. You can even get reasonably nice-sounding
bells, even though true bell tones are decidedly inharmonic and therefore cannot be duplicated by simple waveform table scanning.
Many of the instrument definitions (sets of harmonic envelopes) that have been experimented with are based on computer analyses
of musical instruments published in the Computer Music Journal by James A. Moorer (see references).
One particularly successful instrument simulation done by Cliff Ashcraft has been a piano. To cover the wide range of the piano,
it is necessary to define several instruments, one for each octave. This is because the quality of piano sound varies in different
pitch ranges due to differences in string construction and the fact that the sounding board has a finite mass. Music played with
his piano definitions is amazingly realistic, just like a real piano in the next room. Consult the references for a full description
of the system.
This article is not primarily concerned with simulating existing musical instruments with a microcomputer. The real interest,
and future of computer music synthesis, is in dreaming up entirely new instrumental sounds and composing scores that complement
them.
Tone color as a musical variable is just as important as pitch and rhythm and may become more so, since pitch and rhythm
composition has been experimented with for centuries, whereas timbre composition has only recently been possible.
Convincing simulation of existing musical instruments is an important milestone because most conventional musical instruments
produce very complex sounds. Doing a good job on them implies the capability to begin exploring timbre space without a lot
of restrictions.
Delayed-Playback Digital Synthesis
While you can do amazing things with real-time software digital synthesis on a microcomputer, the compromises, shortcuts and
relatively low sample rates necessary leave something to be desired in the area of fidelity. The faster microprocessors that
are beginning to appear (both higher clock frequency standard units and the new 16-bit units) will certainly improve the
capability of real-time software synthesis. A 6502 running at 3 MHz, for example (which is currently available), could produce
eight voices at a 12 kHz sample rate for fidelity similar to good AM radio reception. However, there are still a number of musical
features missing which are needed for a truly versatile system of interest to the majority of musicians and listeners.
For example, bending notes (gradually changing their pitch), true vibrato, percussion instrument synthesis and singing voice
synthesis are all needed to penetrate the contemporary music idiom (perhaps this is why Bach is so often performed with computers).
With delayed playback, any or all of the compromises may be eliminated, the sample rate and DAC accuracy may be increased to true
hi-fi levels, and any desired musical feature that can be defined can be implemented.
Fig. 4 shows a block diagram of a delayed-playback software synthesis system as it might be implemented on a microcomputer.
Playing a musical selection is actually a three-step process.
In the first step a machine-readable score is entered or edited from a previous run. Typically, the score file on disk is just
a standard ASCII text file, so a standard text editing program is sufficient. In advanced systems other methods of score entry,
such as graphical input with a light pen, joystick or digitizer or even direct input from a music keyboard, are possible. In
any case, the result of the first step is an integrated score and instrument definition file on disk.
In the second step, a music interpreter program, which also contains all of the synthesis routines, reads the score file, carries
out the indicated synthesis operations and writes a sound file on disk. While the majority of your work is spent creating and
editing the score file, the vast majority of machine work is spent computing the sound file.
Computing a minute of final sound may take anywhere from five minutes to whatever CPU time you can tolerate, depending on the
sample rate, number of simultaneous voices playing and the sophistication of the synthesis techniques. Most of this time is
spent in arithmetic subroutines, so a microprocessor with automatic multiply (such as the 6809, 9900 and all of the new 16-bit units)
is a distinct advantage.
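The second step can be sketched in a few lines. This is a minimal illustration of the idea, not the interpreter described here: the score layout (start time, duration, frequency, amplitude), the plain sine-wave voice and the 16-bit sample encoding are all assumptions chosen to keep the example short.

```python
import math
import struct

SAMPLE_RATE = 20_000  # Hz; assumed for this sketch

def render(score, sample_rate=SAMPLE_RATE):
    """Mix every note of the score into one list of samples.

    score: list of (start_s, duration_s, freq_hz, amplitude) tuples
    (a hypothetical layout, not the article's score language).
    """
    total_s = max(start + dur for start, dur, _, _ in score)
    out = [0.0] * int(round(total_s * sample_rate))
    for start, dur, freq, amp in score:
        first = int(round(start * sample_rate))
        count = int(round(dur * sample_rate))
        for i in range(count):
            if first + i < len(out):
                # No deadline here: each sample may take as long as it needs.
                out[first + i] += amp * math.sin(2 * math.pi * freq * i / sample_rate)
    return out

def encode_sound_file(samples):
    """Encode samples as signed 16-bit little-endian integers (assumed format)."""
    data = bytearray()
    for s in samples:
        clipped = max(-1.0, min(1.0, s))
        data += struct.pack("<h", int(clipped * 32767))
    return bytes(data)

score = [(0.0, 0.5, 440.0, 0.4), (0.25, 0.5, 660.0, 0.4)]  # two overlapping notes
sound = encode_sound_file(render(score))
print(len(sound), "bytes of sound data")
```

Because nothing has to finish within one sample period, adding voices or more elaborate synthesis only lengthens the computation; it never degrades the sound.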
In the playback step, a highly specialized program reads the sound file from disk and sends the sound samples to the DAC at a
uniform rate. When high-resolution DACs (ten bits or more) are used, the uniformity of sample rate becomes critical to minimize
jitter distortion. In order to achieve such uniformity while the program is also handling data readback from the sound file,
the DAC must generally be equipped with its own sample clock and at least one level of data buffering.
A Delayed-Playback System
I implemented an experimental delayed-playback software digital synthesis system and demonstrated it at the PC '80 computer show
in Philadelphia this fall. It runs on the 6502-based KIM-1 microcomputer equipped with 16K of RAM and a Micro Technology Unlimited
(MTU) disk controller, which adds another 16K. Two Siemens eight-inch floppy disk drives are used, and the double-density capability
of the MTU controller is utilized.
An experimental 12-bit digital-to-analog converter with an additional three bits of gain control is used to get a theoretical
dynamic range equivalent to a 16-bit DAC. The gain control is not yet utilized by the software, however. An important feature of
the experimental DAC is a 256-sample first-in-first-out buffer which allows the sample stream from the computer to be interrupted
for milliseconds at a time without affecting the smooth flow of data to the DAC itself.
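How long an interruption the buffer can absorb follows directly from its depth and the sample rate. The sketch below assumes the buffer is full when the interruption begins and uses the 20 kHz rate of the playback program; the function name is illustrative.

```python
def fifo_tolerance_ms(fifo_depth, sample_rate_hz):
    """Longest gap in the incoming sample stream, in milliseconds, that a
    full buffer can cover before the DAC runs out of data."""
    return 1000.0 * fifo_depth / sample_rate_hz

print(f"{fifo_tolerance_ms(256, 20_000):.1f} ms")  # 12.8 ms
```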
When floppy disks are used to hold the sound file, the disk format is an important determinant of the maximum playback data rate.
The normal CODOS disk operating system software (which is used to prepare the score file) uses the standard IBM disk format
of 26 sectors of 256 bytes each, but with this format the total diskette capacity is only about 512K bytes.
A different format consisting of 16 sectors of 512 bytes is used for the sound file and gives 630K bytes per disk, a 23 percent
increase in potential data rate and capacity. In order to read through the sound file at high speed, it is mandatory to be able
to read all of the sectors on a track in one revolution of the disk. In addition, you must be able to step to the next track
without waiting for a whole revolution before reading again. Staggering the sector numbers by three on adjacent tracks
accomplishes this. The resulting sustained average data rate from the disk can approach 40K bytes per second.
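These figures can be checked with a little arithmetic. The sketch below assumes the usual eight-inch diskette geometry of 77 tracks and a 360 rpm drive, neither of which is stated above, and it treats the three-sector stagger as three sector times lost at every track change.

```python
TRACKS = 77   # assumption: tracks on a standard eight-inch diskette
RPM = 360     # assumption: standard eight-inch drive rotation speed

def disk_capacity(sectors, sector_bytes, tracks=TRACKS):
    """Total bytes on one side of a diskette."""
    return sectors * sector_bytes * tracks

def sustained_rate(sectors, sector_bytes, stagger, rpm=RPM):
    """Average bytes per second when each track is read in one revolution
    and each track change costs `stagger` sector times."""
    revolution_s = 60.0 / rpm
    track_time_s = revolution_s * (sectors + stagger) / sectors
    return sectors * sector_bytes / track_time_s

ibm = disk_capacity(26, 256)     # standard IBM format
sound = disk_capacity(16, 512)   # sound file format
print(ibm, sound)                                  # 512512 630784
print(f"{100 * (sound / ibm - 1):.0f} percent")    # 23 percent
print(f"{sustained_rate(16, 512, 3):.0f} bytes per second")  # 41391
```

Under these assumptions the sustained rate works out to a little over 41,000 bytes per second, in line with the 40K figure.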
The actual playback program currently uses a 20 kHz sample rate with 12-bit samples for a total data rate of 30K bytes per second.
At this data rate, an eight-inch diskette holds about 21 seconds of sound. Going to double-sided disks would double the capacity
to 42 seconds. Minidisks have about half the capacity but, more important, only half the maximum data rate.
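The 30K figure implies that each 12-bit sample occupies one and a half bytes, so two samples must share three bytes on the disk. The sketch below shows the data-rate arithmetic and one possible packing; the byte layout is an assumption made for illustration, since the actual sound file layout is not described here.

```python
def data_rate(sample_rate_hz, bits_per_sample):
    """Bytes per second needed to sustain playback."""
    return sample_rate_hz * bits_per_sample / 8

def pack12(samples):
    """Pack unsigned 12-bit samples (0..4095) two to every three bytes.
    The layout is hypothetical: high 8 bits of the first sample, then its
    low 4 bits with the high 4 bits of the second, then the remaining 8 bits."""
    if len(samples) % 2:
        raise ValueError("need an even number of samples")
    out = bytearray()
    for a, b in zip(samples[0::2], samples[1::2]):
        out += bytes([a >> 4, ((a & 0x0F) << 4) | (b >> 8), b & 0xFF])
    return bytes(out)

def unpack12(data):
    """Inverse of pack12."""
    samples = []
    for i in range(0, len(data), 3):
        b0, b1, b2 = data[i], data[i + 1], data[i + 2]
        samples.append((b0 << 4) | (b1 >> 4))
        samples.append(((b1 & 0x0F) << 8) | b2)
    return samples

rate = data_rate(20_000, 12)
print(rate)                                      # 30000.0
print(f"{630_784 / rate:.1f} seconds per disk")  # 21.0
```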
The synthesis and computation phase of a performance is relatively straightforward on the experimental system. The score file is
read from drive 0 using CODOS, and the sound file records are written onto drive 1 using a set of specialized disk driver routines.
When a sound disk is filled up, the synthesis program waits for a new disk to be inserted into drive 1.
When the playback program is called in, CODOS is disabled and the operator is expected to put the first sound disk in drive 0 and
the second one in drive 1. When playback starts, the first 21 seconds of sound are read from drive 0 and then an immediate,
inaudible switchover to drive 1 is performed. During the next 21 seconds, the operator must remove sound disk 1 from drive 0 and
insert disk 3 to be read when disk 2 is exhausted. You can switch back and forth like this indefinitely for music of any duration;
the performance at the PC '80 show required 23 disks for eight minutes of sound.
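The disk-swapping routine is easy to tabulate. The sketch below takes 21 seconds per disk and alternates between the two drives; the function names are illustrative only.

```python
import math

SECONDS_PER_DISK = 21  # from the text

def disks_needed(duration_s, seconds_per_disk=SECONDS_PER_DISK):
    """Number of sound disks required for a piece of the given length."""
    return math.ceil(duration_s / seconds_per_disk)

def playback_schedule(num_disks, seconds_per_disk=SECONDS_PER_DISK):
    """List of (disk number, drive, start time in seconds).
    Odd-numbered disks play from drive 0, even-numbered from drive 1."""
    return [(d, (d - 1) % 2, (d - 1) * seconds_per_disk)
            for d in range(1, num_disks + 1)]

print(disks_needed(8 * 60))  # 23
for disk, drive, start in playback_schedule(4):
    print(f"disk {disk}: drive {drive}, starts at {start} s")
```

While one drive is playing, the operator has the full 21 seconds to change the disk in the other.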
The problem in using this system is not the disk jockeying required during playback but the changing of disks during computation.
With the music selected for performance, a new disk was required about every 15 to 30 minutes, which means that the computation
cannot be left to run overnight with any degree of benefit. Clearly, a 10-megabyte hard disk would be advantageous here.
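A rough calculation shows how much a hard disk would help. The sketch below uses the 30K-byte-per-second data rate, 21 seconds of sound per floppy and the observed 15 to 30 minutes of computation per disk; reading "10 megabyte" as 10 million bytes is an assumption.

```python
BYTES_PER_SECOND = 30_000      # playback data rate, from the text
SECONDS_PER_DISK = 21          # sound held by one floppy, from the text
HARD_DISK_BYTES = 10_000_000   # assumption: 10 megabytes = 10 million bytes

def compute_ratio(minutes_per_disk):
    """Seconds of computation per second of finished sound."""
    return minutes_per_disk * 60 / SECONDS_PER_DISK

def unattended_hours(minutes_per_disk, disk_bytes=HARD_DISK_BYTES):
    """Hours the synthesis program could run before filling the disk."""
    sound_seconds = disk_bytes / BYTES_PER_SECOND
    return sound_seconds * compute_ratio(minutes_per_disk) / 3600

for minutes in (15, 30):
    print(f"{minutes} min per floppy: ratio {compute_ratio(minutes):.0f} to 1, "
          f"{unattended_hours(minutes):.1f} hours unattended")
```

On these figures a 10-megabyte disk would hold about five and a half minutes of sound and let the computation run for roughly four to eight hours without attention.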
The experimental delayed synthesis program does about the same things as the real-time synthesis program mentioned earlier.
The major differences are an essentially unlimited number of voices, interpolation between waveform table entries and interpolation
between adjacent waveform tables in the sequence rather than sudden switching. It won't be considered complete until the musical
features described previously are implemented.
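The two kinds of interpolation can be sketched as follows. This is a minimal floating-point illustration, not the 6502 routine itself: linear interpolation is assumed in both cases, and the table contents and function names are made up for the example.

```python
def table_lookup(table, phase):
    """Read a single-cycle waveform table at a fractional position.

    phase is the position within the cycle, 0.0 <= phase < 1.0; the value is
    linearly interpolated between the two nearest table entries.
    """
    n = len(table)
    pos = (phase % 1.0) * n
    whole = int(pos)
    frac = pos - whole
    i = whole % n
    return table[i] + frac * (table[(i + 1) % n] - table[i])

def crossfade_lookup(table_a, table_b, mix, phase):
    """Blend two adjacent tables of a sequence: mix = 0.0 gives table_a,
    mix = 1.0 gives table_b, and values in between give a gradual change
    of timbre instead of a sudden switch."""
    a = table_lookup(table_a, phase)
    b = table_lookup(table_b, phase)
    return a + mix * (b - a)

soft = [0.0, 1.0, 0.0, -1.0]    # tiny 4-entry tables, for illustration only
bright = [0.0, 2.0, 0.0, -2.0]
print(table_lookup(soft, 0.125))                  # 0.5
print(crossfade_lookup(soft, bright, 0.5, 0.25))  # 1.5
```

Each output sample now costs several multiplications per voice, which is one reason the delayed-playback computation runs so much slower than real time.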
The Future
While these developments may seem exciting now, the future is likely to see many more exciting things happen in the field of music
synthesis on microcomputers. The sophisticated programmable synthesizer boards will undoubtedly become more sophisticated and
gradually come down in price. Today's square-wave synthesizer chips will probably be supplemented by programmable waveform
synthesizer chips that use direct memory access to automatically scan waveform tables in memory. The most exciting prospects
are in the software synthesis area, however. The processors used in personal systems will gradually get faster at the
machine-language level, which will increase the capability and fidelity of real-time software synthesis. Even a simple step up
to 16 bits, which is inevitable, will nearly double the speed of the core sound routine, giving both more voices and a higher
frequency range. Because of the very low cost of including a DAC in the circuitry of a computer, most future systems will probably
contain built-in DACs.
On the delayed-playback front, experimental systems such as the one just described will reach full development and make it
possible to produce significant music of commercial value with microcomputers. Even the very general and powerful MUSIC-11
system (truly the ultimate in sound synthesis flexibility) has already been implemented on the LSI-11 microcomputer (used in
the Heath H11 and Terak systems), and it is only a matter of time before it is available for the more common microcomputers.
The decreasing cost and increasing capacity of small hard disks will also make using a delayed-playback type of system much
more convenient and increase the fidelity even further.